Add results for KaLM-Team/KaLM_Embedding-X-0605 in MMTEB #227
Model Results Comparison
Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large
Results for KaLM-Team/KaLM-Embedding-X-0605

| task_name | KaLM-Team/KaLM-Embedding-X-0605 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AILAStatutes | 0.47 | 0.49 | 0.21 | 0.01 |
| AfriSentiClassification | 0.54 | 0.54 | 0.46 | 0.01 |
| AlloProfClusteringS2S.v2 | 0.59 | 0.56 | 0.33 | 0.01 |
| AlloprofReranking | 0.81 | 0.82 | 0.69 | 0.01 |
| AmazonCounterfactualClassification | 0.88 | 0.88 | 0.70 | 0.92 |
| ArXivHierarchicalClusteringP2P | 0.64 | 0.65 | 0.56 | 0.01 |
| ArXivHierarchicalClusteringS2S | 0.64 | 0.64 | 0.54 | 0.01 |
| ArguAna | 0.58 | 0.86 | 0.54 | 0.64 |
| ArmenianParaphrasePC | 0.96 | 0.97 | 0.95 | 0.01 |
| BUCC.v2 | 0.99 | 0.99 | 0.99 | 0.01 |
| BelebeleRetrieval | 0.85 | 0.91 | 0.78 | 0.01 |
| BibleNLPBitextMining | 0.20 | 0.21 | 0.17 | 0.01 |
| BigPatentClustering.v2 | 0.46 | 0.38 | 0.31 | 0.00 |
| BiorxivClusteringP2P.v2 | 0.49 | 0.54 | 0.37 | 0.01 |
| BornholmBitextMining | 0.52 | 0.52 | 0.44 | 0.01 |
| BrazilianToxicTweetsClassification | 0.24 | 0.28 | 0.21 | 0.00 |
| BulgarianStoreReviewSentimentClassfication | 0.80 | 0.78 | 0.64 | 0.01 |
| CEDRClassification | 0.43 | 0.57 | 0.45 | 0.01 |
| CLSClusteringP2P.v2 | 0.47 | 0.43 | 0.40 | 0.01 |
| CSFDSKMovieReviewSentimentClassification | 0.57 | 0.49 | 0.35 | 0.01 |
| CTKFactsNLI | 0.83 | 0.88 | 0.80 | 0.01 |
| CataloniaTweetClassification | 0.48 | 0.55 | 0.50 | 0.01 |
| Core17InstructionRetrieval | 0.05 | 0.08 | -0.02 | 0.00 |
| CovidRetrieval | 0.82 | 0.79 | 0.76 | 0.01 |
| CyrillicTurkicLangClassification | 0.81 | 0.95 | 0.41 | 0.01 |
| CzechProductReviewSentimentClassification | 0.67 | 0.68 | 0.57 | 0.01 |
| DBpediaClassification | 0.95 | 0.95 | 0.88 | 0.01 |
| DalajClassification | 0.50 | 0.50 | 0.50 | 0.01 |
| DiaBlaBitextMining | 0.88 | 0.87 | 0.85 | 0.01 |
| EstonianValenceClassification | 0.68 | 0.54 | 0.43 | 0.01 |
| FaroeseSTS | 0.80 | 0.86 | 0.72 | 0.01 |
| FilipinoShopeeReviewsClassification | 0.50 | 0.48 | 0.35 | 0.01 |
| FinParaSTS | 0.26 | 0.29 | 0.25 | 0.00 |
| FinancialPhrasebankClassification | 0.94 | 0.89 | 0.84 | 0.01 |
| FloresBitextMining | 0.78 | 0.84 | 0.81 | 0.01 |
| GermanSTSBenchmark | 0.85 | 0.88 | 0.84 | 0.01 |
| GreekLegalCodeClassification | 0.42 | 0.44 | 0.37 | 0.01 |
| GujaratiNewsClassification | 0.92 | 0.92 | 0.77 | 0.01 |
| HALClusteringS2S.v2 | 0.32 | 0.32 | 0.23 | 0.00 |
| HagridRetrieval | 0.99 | 0.99 | 0.99 | 0.01 |
| IN22GenBitextMining | 0.92 | 0.94 | 0.77 | 0.01 |
| IndicCrosslingualSTS | 0.53 | 0.63 | 0.44 | 0.01 |
| IndicGenBenchFloresBitextMining | 0.96 | 0.97 | 0.89 | 0.01 |
| IndicLangClassification | 0.86 | 0.88 | 0.20 | 0.01 |
| IndonesianIdClickbaitClassification | 0.64 | 0.67 | 0.61 | 0.01 |
| IsiZuluNewsClassification | 0.33 | 0.41 | 0.32 | 0.00 |
| ItaCaseholdClassification | 0.68 | 0.73 | 0.67 | 0.01 |
| JSICK | 0.85 | 0.85 | 0.80 | 0.01 |
| KorHateSpeechMLClassification | 0.22 | 0.18 | 0.10 | 0.00 |
| KorSarcasmClassification | 0.66 | 0.61 | 0.57 | 0.01 |
| KurdishSentimentClassification | 0.83 | 0.86 | 0.77 | 0.01 |
| LEMBPasskeyRetrieval | 0.39 | 0.39 | 0.38 | 0.01 |
| LegalBenchCorporateLobbying | 0.94 | 0.96 | 0.90 | 0.01 |
| MIRACLRetrievalHardNegatives | 0.61 | 0.70 | 0.67 | 0.01 |
| MLQARetrieval | 0.81 | 0.84 | 0.76 | 0.01 |
| MacedonianTweetSentimentClassification | 0.72 | 0.72 | 0.62 | 0.01 |
| MalteseNewsClassification | 0.47 | 0.37 | 0.24 | 0.00 |
| MasakhaNEWSClassification | 0.84 | 0.84 | 0.78 | 0.01 |
| MasakhaNEWSClusteringS2S | 0.64 | 0.57 | 0.38 | 0.01 |
| MassiveIntentClassification | 0.77 | 0.82 | 0.60 | 0.85 |
| MedrxivClusteringP2P.v2 | 0.40 | 0.47 | 0.34 | 0.01 |
| MultiEURLEXMultilabelClassification | 0.05 | 0.05 | 0.05 | 0.00 |
| MultiHateClassification | 0.83 | 0.72 | 0.64 | 0.01 |
| NTREXBitextMining | 0.90 | 0.94 | 0.91 | 0.01 |
| NepaliNewsClassification | 0.98 | 0.98 | 0.88 | 0.01 |
| News21InstructionRetrieval | 0.02 | 0.10 | -0.00 | 0.00 |
| NollySentiBitextMining | 0.58 | 0.69 | 0.67 | 0.01 |
| NordicLangClassification | 0.72 | 0.86 | 0.80 | 0.01 |
| NorwegianCourtsBitextMining | 0.93 | 0.93 | 0.94 | 0.01 |
| NusaParagraphEmotionClassification | 0.47 | 0.56 | 0.42 | 0.01 |
| NusaTranslationBitextMining | 0.71 | 0.78 | 0.67 | 0.01 |
| NusaX-senti | 0.78 | 0.80 | 0.71 | 0.01 |
| NusaXBitextMining | 0.83 | 0.83 | 0.73 | 0.01 |
| OdiaNewsClassification | 0.85 | 0.92 | 0.80 | 0.01 |
| OpusparcusPC | 0.96 | 0.97 | 0.95 | 0.01 |
| PAC | 0.70 | 0.72 | 0.70 | 0.01 |
| PawsXPairClassification | 0.60 | 0.60 | 0.55 | 0.01 |
| PlscClusteringP2P.v2 | 0.75 | 0.74 | 0.72 | 0.01 |
| PoemSentimentClassification | 0.72 | 0.60 | 0.51 | 0.01 |
| PolEmo2.0-OUT | 0.76 | 0.78 | 0.36 | 0.01 |
| PpcPC | 0.94 | 0.96 | 0.92 | 0.01 |
| PunjabiNewsClassification | 0.84 | 0.83 | 0.81 | 0.01 |
| RTE3 | 0.89 | 0.90 | 0.88 | 0.01 |
| Robust04InstructionRetrieval | -0.01 | -0.02 | -0.07 | 0.00 |
| RomaniBibleClustering | 0.43 | 0.43 | 0.41 | 0.00 |
| RuBQReranking | 0.76 | 0.74 | 0.76 | 0.01 |
| SCIDOCS | 0.22 | 0.25 | 0.17 | 0.25 |
| SIB200ClusteringS2S | 0.45 | 0.42 | 0.24 | 0.00 |
| SICK-R | 0.81 | 0.83 | 0.80 | 0.82 |
| STS12 | 0.81 | 0.82 | 0.80 | 0.80 |
| STS13 | 0.86 | 0.90 | 0.82 | 0.89 |
| STS14 | 0.82 | 0.85 | 0.78 | 0.85 |
| STS15 | 0.88 | 0.90 | 0.89 | 0.89 |
| STS17 | 0.83 | 0.89 | 0.82 | 0.91 |
| STS22.v2 | 0.72 | 0.72 | 0.64 | 0.01 |
| STSB | 0.82 | 0.85 | 0.82 | 0.01 |
| STSBenchmark | 0.86 | 0.89 | 0.87 | 0.88 |
| STSES | 0.77 | 0.82 | 0.80 | 0.01 |
| ScalaClassification | 0.51 | 0.52 | 0.52 | 0.01 |
| SemRel24STS | 0.68 | 0.73 | 0.63 | 0.01 |
| SentimentAnalysisHindi | 0.79 | 0.76 | 0.64 | 0.01 |
| SinhalaNewsClassification | 0.80 | 0.82 | 0.67 | 0.01 |
| SiswatiNewsClassification | 0.52 | 0.62 | 0.54 | 0.01 |
| SlovakMovieReviewSentimentClassification | 0.94 | 0.90 | 0.74 | 0.01 |
| SpartQA | 0.10 | 0.10 | 0.06 | 0.00 |
| SprintDuplicateQuestions | 0.95 | 0.97 | 0.93 | 0.96 |
| StackExchangeClustering.v2 | 0.66 | 0.92 | 0.46 | 0.01 |
| StackOverflowQA | 0.93 | 0.97 | 0.89 | 0.01 |
| StatcanDialogueDatasetRetrieval | 0.47 | 0.51 | 0.11 | 0.01 |
| SwahiliNewsClassification | 0.65 | 0.66 | 0.60 | 0.01 |
| SwednClusteringP2P | 0.49 | 0.46 | 0.37 | 0.01 |
| SwissJudgementClassification | 0.55 | 0.58 | 0.54 | 0.01 |
| T2Reranking | 0.67 | 0.68 | 0.66 | 0.01 |
| TERRa | 0.57 | 0.64 | 0.58 | 0.01 |
| TRECCOVID | 0.85 | 0.86 | 0.71 | 0.85 |
| Tatoeba | 0.81 | 0.82 | 0.76 | 0.01 |
| TempReasonL1 | 0.02 | 0.03 | 0.01 | 0.00 |
| ToxicConversationsClassification | 0.86 | 0.89 | 0.66 | 0.87 |
| TswanaNewsClassification | 0.51 | 0.53 | 0.47 | 0.01 |
| TweetTopicSingleClassification | 0.78 | 0.71 | 0.65 | 0.01 |
| TwitterHjerneRetrieval | 0.79 | 0.98 | 0.35 | 0.01 |
| TwitterURLCorpus | 0.87 | 0.87 | 0.86 | 0.87 |
| VoyageMMarcoReranking | 0.71 | 0.67 | 0.68 | 0.01 |
| WebLINXCandidatesReranking | 0.16 | 0.11 | 0.08 | 0.00 |
| WikiCitiesClustering | 0.94 | 0.92 | 0.76 | 0.01 |
| WikiClusteringP2P.v2 | 0.32 | 0.28 | 0.26 | 0.00 |
| WikipediaRerankingMultilingual | 0.89 | 0.92 | 0.89 | 0.01 |
| WikipediaRetrievalMultilingual | 0.90 | 0.94 | 0.90 | 0.01 |
| WinoGrande | 0.47 | 0.61 | 0.55 | 0.01 |
| XNLI | 0.82 | 0.85 | 0.75 | 0.01 |
| indonli | 0.58 | 0.61 | 0.52 | 0.01 |
| Average | 0.66 | 0.68 | 0.59 | 0.10 |
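
For anyone who wants to sanity-check the Average row, here is a minimal sketch that parses a pipe-delimited results table like the one above and recomputes the per-column mean. It assumes the table has been saved as plain text; `parse_results_table` and `column_means` are hypothetical helper names used for illustration and are not part of mteb. The sample uses three rows copied from the table, so its means will not match the full-table Average.

```python
from statistics import mean

# Three rows copied from the comparison table, used only as sample input.
SAMPLE = """
| task_name | KaLM-Team/KaLM-Embedding-X-0605 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AILAStatutes | 0.47 | 0.49 | 0.21 | 0.01 |
| ArguAna | 0.58 | 0.86 | 0.54 | 0.64 |
| SCIDOCS | 0.22 | 0.25 | 0.17 | 0.25 |
"""


def parse_results_table(markdown: str) -> dict[str, dict[str, float]]:
    """Parse a pipe-delimited table into {column_name: {task_name: score}}."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    columns = header[1:]  # first column holds the task name
    scores: dict[str, dict[str, float]] = {col: {} for col in columns}
    for line in lines[2:]:  # skip the header row and the |---| separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        task = cells[0]
        if task == "Average":  # summary row, not a task
            continue
        for col, cell in zip(columns, cells[1:]):
            scores[col][task] = float(cell)
    return scores


def column_means(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Return the mean score per column, rounded to two decimals."""
    return {col: round(mean(tasks.values()), 2) for col, tasks in scores.items()}


if __name__ == "__main__":
    for col, avg in column_means(parse_results_table(SAMPLE)).items():
        print(f"{col}: {avg:.2f}")
```

Because the table values are already rounded to two decimals, a mean recomputed this way can differ slightly from an Average computed on the unrounded scores.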
@KennethEnevoldsen
Checklist
mteb/models/ (this can be an API). Instructions on how to add a model can be found here